
    Consistent Second-Order Conic Integer Programming for Learning Bayesian Networks

    Bayesian Networks (BNs) represent conditional probability relations among a set of random variables (nodes) in the form of a directed acyclic graph (DAG) and have found diverse applications in knowledge discovery. We study the problem of learning the sparse DAG structure of a BN from continuous observational data. The central problem can be modeled as a mixed-integer program whose objective combines a convex quadratic loss function with a regularization penalty, subject to linear constraints. The optimal solution to this mathematical program is known to have desirable statistical properties under certain conditions. However, state-of-the-art optimization solvers cannot obtain provably optimal solutions to the existing mathematical formulations for medium-size problems within reasonable computational times. To address this difficulty, we tackle the problem from both computational and statistical perspectives. On the one hand, we propose a concrete early stopping criterion for terminating the branch-and-bound process, yielding a near-optimal solution to the mixed-integer program, and we establish the consistency of this approximate solution. On the other hand, we improve the existing formulations by replacing the linear "big-M" constraints that link the continuous and binary indicator variables with second-order conic constraints. Our numerical results demonstrate the effectiveness of the proposed approaches.
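
    To make the replacement of big-M constraints concrete: for a single arc, the formulation links a continuous weight beta to a binary indicator g. The sketch below is a minimal illustration of the idea in cvxpy, not the paper's full formulation; the bound M, penalty weight lam, and target b are assumed purely for the example. It compares the continuous-relaxation bounds produced by the big-M link against a perspective (second-order conic) link.

        import cvxpy as cp

        M = 10.0    # big-M bound on the arc weight (assumed for illustration)
        lam = 1.0   # regularization penalty weight (assumed)
        b = 0.3     # least-squares target for this one-dimensional toy problem

        def relaxation_bound(conic: bool) -> float:
            beta = cp.Variable()              # continuous arc weight
            g = cp.Variable()                 # relaxed 0/1 indicator for the arc
            s = cp.Variable(nonneg=True)      # epigraph variable for beta**2
            cons = [g >= 0, g <= 1]
            if conic:
                # perspective (rotated second-order cone) link: beta**2 <= s * g
                cons.append(cp.quad_over_lin(beta, g) <= s)
            else:
                # classical big-M link plus a plain epigraph of the quadratic
                cons += [beta <= M * g, beta >= -M * g, cp.square(beta) <= s]
            # quadratic loss (beta - b)**2 expanded, constant dropped, plus penalty
            return cp.Problem(cp.Minimize(s - 2 * b * beta + lam * g), cons).solve()

        for name, conic in (("big-M", False), ("conic", True)):
            print(name, relaxation_bound(conic))  # conic bound is never weaker

    Because the conic constraint enforces beta**2 <= s * g, a small relaxed indicator g forces the loss epigraph variable s upward, tightening the relaxation; the big-M link only enforces |beta| <= M * g, which weakens as M grows.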

    Mixed Integer Quadratic Optimization for Learning Directed Acyclic Graphs from Continuous Data

    Thesis (Ph.D.)--University of Washington, 2019

    The study of probabilistic graphical models (PGMs) is an essential topic in statistics and machine learning. Bayesian networks (BNs), arguably one of the most central classes of PGMs, are frequently used to represent causal relations among a set of random variables in complex systems. A Bayesian network (BN) is a PGM consisting of a labeled directed acyclic graph (DAG) in which the vertices correspond to random variables (nodes) and the edge set prescribes a decomposition of the joint probability distribution of the nodes, so that the value of any node is a probabilistic function of the values of its parents in the DAG. The edge set encodes Markov conditions on the nodes in the sense that each node is conditionally independent of its non-descendants given its parents. While the statistical properties of BNs learned from continuous data have been extensively studied, the development of efficient computational tools for learning an optimal DAG remains an open challenge. The goal is to learn a DAG structure that maximizes a score function, one example being the posterior probability of the DAG structure given the data. Learning DAGs from observational data is computationally difficult because the number of possible DAGs scales super-exponentially with the number of nodes.

    In this work, we propose novel discrete optimization formulations for learning DAGs over continuous variables. The problem can be cast as a mixed-integer quadratic program (MIQP) whose objective is the penalized negative log-likelihood (PNL) with an L0 regularization penalty, subject to linear constraints. There are two key challenges: (i) imposing a set of constraints that removes cycles from a directed graph, and (ii) enforcing tight bounds on the semi-continuous optimization variables corresponding to the arc weights. We tackle the first challenge by presenting a way to remove cycles that results in a new MIQP formulation with linear constraints, referred to as the layered network (LN) formulation. We establish that LN is a compact formulation and that, under a mild condition, the objective value of its continuous relaxation is as tight as that of stronger but larger formulations. An additional benefit of the LN formulation is that it effectively incorporates prior structural knowledge (a super-structure) to reduce the set of candidate DAGs. Computational results indicate that the proposed formulation outperforms existing mathematical formulations and scales better than available algorithms that solve the same problem with only L1 regularization, especially in the presence of a sparse super-structure.

    For the second challenge, semi-continuous variables are commonly modeled with a standard "big-M constraint" in the associated mixed-integer program. However, this strategy leads to a poor continuous relaxation because there is no natural upper bound for the arc weights. To circumvent this deficiency, we present a mixed-integer second-order cone program (MISOCP), which has tighter continuous relaxation bounds than the existing formulations based on big-M constraints, including the LN formulation. The performance of each formulation depends on the size and tightness of its continuous relaxation; we show the promising performance of the MISOCP in terms of reducing the optimality gap compared with the best existing optimization formulations. This work highlights that the best formulation applies the LN constraints to remove cycles, keeping the size of the optimization problem small, while using conic constraints to tighten the semi-continuous variables.
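
    To make the layered-network idea concrete, the following minimal sketch (plain Python; the names g for the 0/1 adjacency matrix and psi for the node layers are assumed for illustration, not the thesis's notation) checks the kind of linear inequalities the LN formulation uses: a selected arc j -> k must climb at least one layer, which no layer assignment can achieve around a cycle.

        def ln_feasible(g, psi):
            """Check the LN inequalities for a 0/1 adjacency matrix g and layers psi."""
            m = len(g)
            return all(
                psi[k] - psi[j] >= 1 - m * (1 - g[j][k])   # selected arcs must climb
                for j in range(m) for k in range(m) if j != k
            )

        dag = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # 0 -> 1 -> 2, acyclic
        cyc = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # 0 -> 1 -> 2 -> 0, a cycle
        print(ln_feasible(dag, [1, 2, 3]))       # True: layers order the DAG
        print(any(ln_feasible(cyc, [a, b, c])    # False: no layer assignment works
                  for a in range(1, 4) for b in range(1, 4) for c in range(1, 4)))

    Summing the arc inequalities around any directed cycle yields a contradiction, so every feasible (g, psi) pair encodes an acyclic graph while the constraints stay linear.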

    Lagrangian relaxation heuristics for the uncapacitated single-source multi-product facility location problem

    The facility location problem is one of the strategic logistical drivers within the supply chain and a hard-to-solve optimization problem. In this study, we focus on the uncapacitated single-source multi-product production/distribution facility location problem with set-up costs. To tackle this decision problem efficiently, two Lagrangian-based heuristics are proposed, one of which incorporates integer cuts to strengthen the formulation. Local search operators are also embedded within these methods to improve the upper bounds as the search progresses. Three sets of instances with various characteristics are generated and used to evaluate the performance of the proposed algorithms. Encouraging results are obtained when the heuristics are assessed against an ILP formulation solved with CPLEX; the latter is used to generate optimal solutions for small instances and to produce upper and lower bounds for larger ones within a limited execution time.
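
    As a sketch of the Lagrangian-relaxation machinery (not the authors' heuristics, which additionally use integer cuts and local search, and here applied to a plain single-product uncapacitated instance), the code below dualizes the single-sourcing constraints and runs a textbook subgradient loop; all data and parameter names are synthetic assumptions, and numpy is required.

        import numpy as np

        rng = np.random.default_rng(0)
        n_cust, n_fac = 30, 8
        c = rng.uniform(1, 10, (n_cust, n_fac))   # assignment costs (synthetic)
        f = rng.uniform(20, 40, n_fac)            # facility set-up costs (synthetic)

        u = np.zeros(n_cust)           # multipliers for the single-sourcing rows
        best_lb, best_ub = -np.inf, np.inf
        theta = 2.0                    # classical subgradient step-size parameter

        for _ in range(200):
            red = c - u[:, None]                       # reduced assignment costs
            contrib = f + np.minimum(red, 0.0).sum(0)  # value of opening facility j
            open_j = contrib < 0                       # facility subproblem solution
            lb = u.sum() + contrib[open_j].sum()       # Lagrangian lower bound
            best_lb = max(best_lb, lb)

            # feasibility heuristic: serve each customer from the cheapest facility
            # among those the subproblem opened (open the cheapest one if none)
            if not open_j.any():
                open_j[np.argmin(f)] = True
            ub = f[open_j].sum() + c[:, open_j].min(axis=1).sum()
            best_ub = min(best_ub, ub)

            x = np.where(red < 0, 1.0, 0.0) * open_j   # relaxed assignments
            viol = 1.0 - x.sum(axis=1)                 # subgradient of dual function
            if not viol.any():
                break                                  # relaxed solution is feasible
            u += theta * (best_ub - lb) / (viol @ viol) * viol

        print(f"lower bound {best_lb:.2f}, best feasible cost {best_ub:.2f}")

    The relaxed problem separates by facility, so each iteration is one pass over the cost matrix, and the gap between best_lb and best_ub bounds how far the heuristic solution can be from optimal, the same kind of guarantee the study assesses against CPLEX bounds.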